1. INTRODUCTION


In practical applications of pattern recognition, such as the classification
of remotely sensed multispectral scanner (MSS) imagery data, it is usually
difficult to obtain labels for the training patterns. The labels are provided
by an analyst-interpreter (AI), who examines the imagery film and uses
ancillary information (such as historical information, cropping practices,
and crop calendar models for agricultural imagery). Very often these labels
are imperfect, and acquiring labels for the training patterns is costly.

In the literature, several researchers have investigated the problem of pattern
recognition with imperfectly labeled patterns. Kashyap (ref. 1) proposed an
iterative training procedure for a two-class case. Shanmugam and Breipohl
(ref. 2) developed an error-correcting procedure for disjoint densities using
Parzen density estimators (refs. 3-6). Chittineni (refs. 7-9) investigated the
problem of learning with imperfectly labeled patterns and studied the
applicability of probabilistic distance measures for feature selection with
imperfectly labeled patterns. Most of these proposed schemes require knowledge
of the probabilities of label imperfections, which usually are not available.

Several scientists considered the problem of estimation of recognition system 
performance (refs. 10-15). Highleyman (ref. 12) investigated the problem of 
estimating the probability of error of a given classifier both for known and 
unknown a priori probabilities. Fukunaga and Kessell (ref. 13) examined the 
problem of estimating the probability of error using unlabeled samples. 

Chow (ref. 14) established a relationship between error and reject rates, 
which is useful in estimating the probability of error using unlabeled 
samples. Chittineni (ref. 15) investigated the problem of estimating recogni- 
tion system performance and probabilities of label imperfections as maximum 
likelihood estimates from the classifier decisions of labeled and unlabeled 
patterns. 





It is the purpose of this paper to present schemes for estimating the
probabilities of label imperfections and for correcting the labels of
mislabeled patterns with a specified probability that the label correction
scheme gives a bad label to a pattern. It is assumed that a set of imperfectly
labeled patterns and a set of unlabeled patterns are given. The proposed
schemes use Parzen density estimators and both the imperfectly labeled and the
unlabeled pattern sets.


The paper is organized in the following manner. Section 2 presents a model
for label imperfections and develops relationships between the densities with
and without imperfections in the labels. Section 3 develops a scheme for
estimating the probabilities of label imperfections using Parzen density
estimators and presents experimental results in the processing of remotely
sensed MSS imagery data. Section 4.1 presents a thresholding scheme for the
correction of pattern mislabels; in section 4.2, a relationship between the
probability that the label correction scheme gives a bad label to a pattern
and the probability that it accepts the original label of a pattern is
developed for a symmetric mislabeling case. In section 4.3, an example is
presented for normal distributions having equal a priori probabilities and
equal covariance matrices to illustrate the behavior of the mislabel
correction scheme. Conclusions are presented in section 5. In appendix A, a
relationship is developed between the Bayes probabilities of error with and
without imperfections in the labels for symmetric probabilities of label
imperfections. For a two-class case, bounds are presented on the probability
of error without imperfections in the labels in terms of the label
imperfection probabilities and the probability of error with imperfections in
the labels. These bounds are shown to become identities when the imperfections
in the labels become symmetric.

In appendix B, a thresholding scheme is proposed for the correction of
mislabels when $\beta < b$.



2. A MODEL FOR LABEL IMPERFECTIONS 


Let $\omega$ and $\omega'$ be the perfect and imperfect labels, respectively, each of which
takes the values $1,2,\ldots,M$, where $M$ is the number of classes. Let $P(\omega = i)$
and $p(X \mid \omega = i)$ be the a priori probabilities and the class conditional
densities, respectively, of the patterns in classes $\omega = i$. The imperfections in
the labels are described by the probabilities

$$\beta_{ji} = P(\omega' = i \mid \omega = j); \quad i,j = 1,2,\ldots,M \qquad (1)$$

where $i$ and $j$ indicate class. We have the constraint

$$\sum_{i=1}^{M} \beta_{ji} = 1 \qquad (2)$$


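It is assumed that

$$p(X \mid \omega = j) = p(X \mid \omega' = i, \omega = j) \qquad (3)$$

That is, the density of a pattern, given its true label, does not depend on
its imperfect label. To obtain a relationship between $p(X \mid \omega = i)$ and
$p(X \mid \omega' = i)$, consider

$$
\begin{aligned}
p(X \mid \omega' = i) &= \sum_{j=1}^{M} p(X, \omega = j \mid \omega' = i) \\
&= \frac{1}{P(\omega' = i)} \sum_{j=1}^{M} p(X \mid \omega' = i, \omega = j)\, P(\omega' = i \mid \omega = j)\, P(\omega = j) \\
&= \frac{1}{P(\omega' = i)} \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j)\, p(X \mid \omega = j)
\end{aligned}
\qquad (4)
$$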


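Multiplying equation (4) by $P(\omega' = i)$ and dividing by $p(X)$ establishes the
relationship between the a posteriori probabilities:

$$p(\omega' = i \mid X) = \sum_{j=1}^{M} \beta_{ji}\, p(\omega = j \mid X) \qquad (5)$$

Similarly, the a priori probabilities are related as follows.

$$P(\omega' = i) = \sum_{j=1}^{M} \beta_{ji}\, P(\omega = j) \qquad (6)$$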


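Inverting equation (4) yields the following result for a two-class case.

$$P(\omega = i)\, p(X \mid \omega = i) = \frac{1}{\Delta}\left[\beta_{jj} P(\omega' = i)\, p(X \mid \omega' = i) - \beta_{ji} P(\omega' = j)\, p(X \mid \omega' = j)\right]; \quad i,j = 1,2; \; i \neq j \qquad (7)$$

where

$$\Delta = \beta_{11}\beta_{22} - \beta_{12}\beta_{21} \qquad (8)$$

Similarly, for the a priori and a posteriori probabilities,

$$P(\omega = i) = \frac{1}{\Delta}\left[\beta_{jj} P(\omega' = i) - \beta_{ji} P(\omega' = j)\right]; \quad i,j = 1,2; \; i \neq j \qquad (9)$$

$$p(\omega = i \mid X) = \frac{1}{\Delta}\left[\beta_{jj}\, p(\omega' = i \mid X) - \beta_{ji}\, p(\omega' = j \mid X)\right]; \quad i,j = 1,2; \; i \neq j \qquad (10)$$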

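For a symmetric case, when

$$\beta_{11} = \beta_{22} = \beta, \quad \beta_{12} = \beta_{21} = 1 - \beta \qquad (11)$$

the determinant $\Delta = \beta_{11}\beta_{22} - \beta_{12}\beta_{21}$ reduces to

$$\Delta = (2\beta - 1) \qquad (12)$$

From equations (7), (11), and (12),

$$P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2) = \frac{1}{2\beta - 1}\left[P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right] \qquad (13)$$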
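The two-class relations of this section can be illustrated with a short sketch. It assumes that the label probabilities for each true class sum to one, so that the off-diagonal terms follow from the two diagonal ones; the function names `noisy_posteriors` and `clean_posteriors` and the arguments `b11` and `b22` (standing for the probabilities of a correct label in classes 1 and 2) are hypothetical and chosen only for this illustration.

```python
def noisy_posteriors(p1, b11, b22):
    """Forward model: map p(w=1|X) to [p(w'=1|X), p(w'=2|X)] (equation (5), M = 2)."""
    b12, b21 = 1.0 - b11, 1.0 - b22   # assumption: probabilities for each true class sum to one
    p2 = 1.0 - p1
    return [b11 * p1 + b21 * p2, b12 * p1 + b22 * p2]


def clean_posteriors(q1, b11, b22):
    """Inverse model: recover [p(w=1|X), p(w=2|X)] from p(w'=1|X) (equation (10))."""
    b12, b21 = 1.0 - b11, 1.0 - b22
    delta = b11 * b22 - b12 * b21     # determinant of the two-class mixing matrix
    if abs(delta) < 1e-12:
        raise ValueError("the imperfect labels carry no information when delta is zero")
    q2 = 1.0 - q1
    return [(b22 * q1 - b21 * q2) / delta, (b11 * q2 - b12 * q1) / delta]


if __name__ == "__main__":
    q = noisy_posteriors(0.8, 0.9, 0.7)
    print(q, clean_posteriors(q[0], 0.9, 0.7))
```

Applying the inverse to the output of the forward model returns the original a posteriori probabilities, which is the property used in section 3 to express the Bayes risk in terms of quantities measured on imperfectly labeled patterns.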


3. ESTIMATION OF LABEL IMPERFECTION PROBABILITIES


In this section, the problem of estimating probabilities of label
imperfections is considered. It is assumed that a set of patterns
$X_i(j)$, $j = 1,2,\ldots,N_i$, is given with imperfect labels $\omega' = i$, $i = 1,2,\ldots,M$,
and a set of unlabeled patterns $X_j$, $j = 1,2,\ldots,N$. It is also assumed that
the a priori probabilities $P(\omega' = i)$ of the imperfectly labeled classes are
available.

3.1 ESTIMATION OF BAYES PROBABILITY OF ERROR 

The risk incurred by the Bayes classifier is the minimum risk that can be
achieved. The labeled and unlabeled samples can be used to estimate the Bayes
probability of error as follows. Let $p(X)$ be the mixture density function.
That is,

$$p(X) = P(\omega = 1)\, p(X \mid \omega = 1) + P(\omega = 2)\, p(X \mid \omega = 2) + \cdots + P(\omega = M)\, p(X \mid \omega = M) \qquad (14)$$

where values for $P(\omega = i)$ are the a priori probabilities and those for
$p(X \mid \omega = i)$ are the class conditional densities. The Bayes classifier
classifies a pattern $X$ into the class whose a posteriori probability is
largest. When $X$ is classified according to the Bayes decision rule, the
conditional probability of error is

$$r(X) = 1 - \max_i [p(\omega = i \mid X)] \qquad (15)$$

The Bayes probability of error is then given by

$$P_e = E[r(X)] = \int r(X)\, p(X)\, dX \qquad (16)$$

Thus, if we know $r(X)$ as a function of $X$, the Bayes probability of error $P_e$
can be estimated by the sample mean of $r(X_j)$ over $N$ test patterns as

$$\hat{P}_e = \frac{1}{N} \sum_{j=1}^{N} r(X_j) \qquad (17)$$

where $X_j$ is drawn from the mixture density and the labels of $X_j$ are not
needed. The estimate of equation (17) is unbiased; that is,

$$E[\hat{P}_e] = P_e \qquad (18)$$

Since $0 \le r(X) \le (M - 1)/M$, the variance of $\hat{P}_e$ satisfies

$$\mathrm{Var}(\hat{P}_e) = \frac{1}{N}\,\mathrm{Var}[r(X)] = \frac{1}{N}\left\{E[r^2(X)] - P_e^2\right\} \qquad (19)$$

$$\mathrm{Var}(\hat{P}_e) \le \frac{P_e(1 - P_e)}{N} - \frac{P_e}{MN} \qquad (20)$$

The variance of $\hat{P}_e$ is at least $P_e/(MN)$ less than the variance
$P_e(1 - P_e)/N$ of the error estimate based on counting misclassified labeled
test patterns. This is because the error count estimate gives a binary
quantization of the error on the test pattern while $r(X)$ assigns a real
value. To use equation (17) in estimating the Bayes probability of error,
knowledge of the risk function is required. The risk function $r(X)$ can be
obtained using density estimators for class conditional densities.
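As an illustration of equations (17) and (20), the following sketch averages the conditional error over a set of a posteriori probability vectors and compares the variance bound with the variance of the error-count estimate. The function names are hypothetical, and the a posteriori probabilities are assumed to be supplied by some density estimator.

```python
def bayes_error_estimate(posteriors):
    """Sample mean of r(X_j) = 1 - max_i p(w=i|X_j) over unlabeled patterns (equation (17))."""
    risks = [1.0 - max(p) for p in posteriors]
    return sum(risks) / len(risks)


def variance_bound(pe, n, m):
    """Upper bound of equation (20) on the variance of the sample-mean estimate."""
    return pe * (1.0 - pe) / n - pe / (m * n)


def error_count_variance(pe, n):
    """Variance of the estimate that counts misclassified labeled test patterns."""
    return pe * (1.0 - pe) / n


if __name__ == "__main__":
    posteriors = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
    print(bayes_error_estimate(posteriors))
    print(variance_bound(0.1, 100, 2), error_count_variance(0.1, 100))
```

For $P_e = 0.1$, $N = 100$, and $M = 2$, the bound is 0.0004 against 0.0009 for the error count, a difference of $P_e/(MN) = 0.0005$.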


3.2 PARZEN ESTIMATE OF r(X) 

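Given a sequence of independent, identically distributed, random n-dimensional
vectors $X_1, X_2, \ldots, X_N$ from a distribution with probability density function
$p(X)$, the Parzen estimate of $p(X)$ is given (refs. 3-6) by

$$\hat{p}_N(X) = \frac{1}{N[h(N)]^n} \sum_{j=1}^{N} K\!\left(\frac{X - X_j}{h(N)}\right) \qquad (21)$$

With the proper choice of the weighting function $h(N)$ and kernel $K(\cdot)$, $\hat{p}_N(X)$
tends uniformly in probability to $p(X)$. Choosing a normal kernel gives

$$\hat{p}_N(X) = \frac{(2\pi)^{-n/2}\,|\Sigma|^{-1/2}}{N[h(N)]^n} \sum_{j=1}^{N} \exp\left\{-\frac{1}{2[h(N)]^2}\,(X - X_j)^T \Sigma^{-1} (X - X_j)\right\} \qquad (22)$$

where $\Sigma$ is the sample covariance matrix of the data. The Parzen estimate of
the conditional error for any $X$ is given by

$$\hat{r}(X) = 1 - \max_i \left[\frac{P(\omega = i)\, \hat{p}(X \mid \omega = i)}{\hat{p}(X)}\right] \qquad (23)$$

where $\hat{p}(X \mid \omega = i)$ is the Parzen estimate of the class conditional density
and $\hat{p}(X) = \sum_{j=1}^{M} P(\omega = j)\, \hat{p}(X \mid \omega = j)$.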
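A minimal sketch of a Parzen estimate with a normal kernel, and of the resulting conditional error, is given below. It is an illustration under stated assumptions, not the implementation used in the experiments: the covariance matrix is taken to be diagonal (`variances`), the smoothing parameter `h` is passed in directly rather than derived from the sample size, and the function names are hypothetical.

```python
import math


def parzen_normal(x, samples, h, variances):
    """Parzen estimate of p(x) with a normal kernel.

    x and each element of samples are sequences of length n; h is the smoothing
    parameter; variances holds per-feature variances (diagonal covariance, an assumption).
    """
    n = len(x)
    det = math.prod(variances)
    norm = (2.0 * math.pi) ** (-n / 2.0) * h ** (-n) / math.sqrt(det)
    total = 0.0
    for xj in samples:
        d2 = sum((a - b) ** 2 / v for a, b, v in zip(x, xj, variances))
        total += math.exp(-0.5 * d2 / h ** 2)
    return norm * total / len(samples)


def conditional_error(x, class_samples, priors, h, variances):
    """Estimated conditional error: 1 - max_i of the estimated a posteriori probabilities."""
    joint = [p * parzen_normal(x, s, h, variances)
             for p, s in zip(priors, class_samples)]
    return 1.0 - max(joint) / sum(joint)


if __name__ == "__main__":
    print(parzen_normal([0.0], [[0.0]], 1.0, [1.0]))
    print(conditional_error([0.0], [[[-1.0]], [[1.0]]], [0.5, 0.5], 1.0, [1.0]))
```

With a single sample at the origin and unit smoothing, the estimate at the origin equals the peak of the standard normal density; a point midway between two equally likely classes has conditional error 0.5.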


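3.3 ESTIMATION OF $\beta_{ij}$ IN THE MULTICLASS CASE

Let

$$
B = \begin{bmatrix}
\beta_{11} & \beta_{21} & \cdots & \beta_{M1} \\
\beta_{12} & \beta_{22} & \cdots & \beta_{M2} \\
\vdots & \vdots & & \vdots \\
\beta_{1M} & \beta_{2M} & \cdots & \beta_{MM}
\end{bmatrix}
$$

$$P_{\omega \mid X} = [p(\omega = 1 \mid X),\, p(\omega = 2 \mid X),\, \ldots,\, p(\omega = M \mid X)]^T$$

$$P_{\omega' \mid X} = [p(\omega' = 1 \mid X),\, p(\omega' = 2 \mid X),\, \ldots,\, p(\omega' = M \mid X)]^T \qquad (24)$$

From equations (5) and (24), we obtain

$$P_{\omega \mid X} = B^{-1} P_{\omega' \mid X} \qquad (25)$$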

Now the problem of estimating probabilities of label imperfections is
formulated as follows.

Find: $\beta_{ij}$, $i,j = 1,2,\ldots,M$, such that $\hat{P}_e$ is minimized, where

$$\hat{P}_e = 1 - \frac{1}{N} \sum_{l=1}^{N} \max_i [p(\omega = i \mid X_l)] \qquad (26)$$

subject to the constraints

$$\sum_{i=1}^{M} \beta_{ji} = 1; \quad j = 1,2,\ldots,M$$

and

$$0 \le \beta_{ij} \le 1; \quad i,j = 1,2,\ldots,M \qquad (27)$$


From the given set of imperfectly labeled patterns and unlabeled patterns,
various quantities in equation (26) are estimated as follows.

$$\hat{p}(X \mid \omega' = i) = \frac{(2\pi)^{-n/2}\,|\Sigma_i|^{-1/2}}{N_i [h(N_i)]^n} \sum_{l=1}^{N_i} \exp\left\{-\frac{1}{2[h(N_i)]^2}\,[X - X_i(l)]^T \Sigma_i^{-1} [X - X_i(l)]\right\} \qquad (28)$$

where $\Sigma_i$ is the sample covariance matrix of the patterns in the class $\omega' = i$,

$$p(\omega' = i \mid X) = \frac{P(\omega' = i)\, \hat{p}(X \mid \omega' = i)}{P(\omega' = 1)\, \hat{p}(X \mid \omega' = 1) + P(\omega' = 2)\, \hat{p}(X \mid \omega' = 2) + \cdots + P(\omega' = M)\, \hat{p}(X \mid \omega' = M)} \qquad (29)$$

and $p(\omega = i \mid X)$, $i = 1,2,\ldots,M$, is obtained from equations (25), (28), and
(29). The estimates of $\beta_{ij}$ that minimize $\hat{P}_e$ subject to the constraints of
equation (27) can be obtained using optimization techniques such as the
Davidon-Fletcher-Powell procedure (refs. 16-18).
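The objective of equation (26) can be sketched as follows. The code only evaluates the objective for a candidate set of label imperfection probabilities; it does not perform the constrained minimization, and it uses plain Gaussian elimination in place of an explicit matrix inverse. The names `solve` and `objective` and the layout `beta[j][i]` (probability that a pattern of true class j receives label i) are assumptions made for this illustration.

```python
def solve(a, y):
    """Solve a z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [v] for row, v in zip(a, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("singular matrix")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def objective(beta, noisy_posteriors):
    """Evaluate 1 - (1/N) sum_l max_i p(w=i|X_l) (equation (26)).

    beta[j][i] = P(w'=i | w=j); noisy_posteriors[l][i] = p(w'=i | X_l).
    """
    m = len(beta)
    # Row i of the mixing matrix holds beta[j][i] for all true classes j.
    b = [[beta[j][i] for j in range(m)] for i in range(m)]
    total = 0.0
    for q in noisy_posteriors:
        total += max(solve(b, q))   # clean posteriors solve B p = q
    return 1.0 - total / len(noisy_posteriors)


if __name__ == "__main__":
    beta = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(objective(beta, [[0.59, 0.24, 0.17]]))
```

In the example, the imperfect a posteriori vector (0.59, 0.24, 0.17) corresponds to the perfect vector (0.7, 0.2, 0.1), so the objective evaluates to 0.3.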


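3.4 ESTIMATION OF $\beta_{ij}$ IN THE TWO-CLASS CASE

From equation (10), in a two-class case the Bayes risk $r(X)$ becomes

$$
\begin{aligned}
r(X) &= \min[p(\omega = 1 \mid X),\, p(\omega = 2 \mid X)] \\
&= \frac{1}{2} - \frac{1}{2}\left|p(\omega = 1 \mid X) - p(\omega = 2 \mid X)\right| \\
&= \frac{1}{2} - \frac{1}{2}\,\frac{1}{|\beta_{11}\beta_{22} - \beta_{12}\beta_{21}|}\left|(\beta_{22} + \beta_{12})\, p(\omega' = 1 \mid X) - (\beta_{11} + \beta_{21})\, p(\omega' = 2 \mid X)\right| \\
&= \frac{1}{2} - \frac{1}{2}\,\frac{\left|2p(\omega' = 1 \mid X) + \beta_{22} - \beta_{11} - 1\right|}{|\beta_{11} + \beta_{22} - 1|}
\end{aligned}
\qquad (30)
$$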


For a two-class case, the problem of estimating $\beta_{ij}$ may now be formulated as
follows.

Find: $\beta_{11}, \beta_{22}$ such that $\hat{P}_e$ is minimized, where

$$\hat{P}_e = \frac{1}{2} - \frac{1}{2N}\,\frac{1}{|\beta_{11} + \beta_{22} - 1|} \sum_{j=1}^{N} \left|2p(\omega' = 1 \mid X_j) + \beta_{22} - \beta_{11} - 1\right| \qquad (31)$$

subject to the constraints

$$0 \le \beta_{11} \le 1, \quad 0 \le \beta_{22} \le 1 \qquad (32)$$

The a posteriori probability $p(\omega' = 1 \mid X_j)$ can be estimated using
equations (28) and (29). The probabilities of label imperfections
$\beta_{ii}$ ($i = 1,2$) that minimize the $\hat{P}_e$ of equation (31), subject to the
constraints of the inequalities in equation (32), can be obtained using
an optimization technique such as that of Davidon-Fletcher-Powell.
Experimental results in processing remotely sensed MSS data are presented in
the next section.
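The two-class objective of equation (31) is simple enough to evaluate directly. The sketch below computes it for given values of the two diagonal probabilities; the function name `two_class_objective` and its argument names are hypothetical, and the optimizer that would search over the two probabilities is deliberately left out.

```python
def two_class_objective(b11, b22, q1_values):
    """Evaluate equation (31) from the values p(w'=1|X_j) on the unlabeled patterns.

    b11 and b22 are the probabilities of a correct label in classes 1 and 2.
    """
    delta = b11 + b22 - 1.0   # equals b11*b22 - b12*b21 when each row sums to one
    if abs(delta) < 1e-12:
        raise ValueError("b11 + b22 = 1 makes the labels uninformative")
    total = sum(abs(2.0 * q1 + b22 - b11 - 1.0) for q1 in q1_values)
    return 0.5 - total / (2.0 * len(q1_values) * abs(delta))


if __name__ == "__main__":
    # Perfect labels: the objective is the mean of min(p, 1 - p).
    print(two_class_objective(1.0, 1.0, [0.9, 0.4]))
    # p(w'=1|X) = 0.78 with b11 = 0.9, b22 = 0.7 corresponds to p(w=1|X) = 0.8.
    print(two_class_objective(0.9, 0.7, [0.78]))
```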


3.5 EXPERIMENTAL RESULTS 

In this section, some results are obtained by applying the theory presented in
the previous sections for estimating the probabilities of label imperfections
in processing remotely sensed Landsat MSS imagery data. The images are of a
5- by 6-nautical-mile area called a segment. The image is divided into a
rectangular array of pixels, 117 rows by 196 columns. The images are overlaid
with a rectangular grid. Two classes are considered: class 1 is wheat, and
class 2 is "other." The pixels at the grid intersections are labeled by an AI
using film products of the images and ancillary data such as historic
information and crop growth stage models. These labels are imperfect labels.
Also, ground truth (GT) labels, which are the true labels for these pixels, are
acquired. Twelve features and 836 unlabeled patterns are used. The numbers
of imperfectly labeled patterns in each class are listed in table 3-1, and the
a priori probabilities $P(\omega' = i)$ are estimated from the number of imperfectly
labeled patterns in each class. The Davidon-Fletcher-Powell optimization
method is used to estimate $\beta_{ii}$ by minimizing $\hat{P}_e$ of equation (31) subject to
the constraints of the inequalities set out in equation (32). Table 3-1
summarizes the labeling accuracies estimated using the method proposed in the
paper and the labeling accuracies computed using AI and GT labels.

TABLE 3-1.- COMPARISON OF ESTIMATED LABELING ACCURACIES WITH THE ONES COMPUTED FROM GT LABELS

                                  No. of AI               Probabilities of label imperfections $P(\omega' = i \mid \omega = j)$
                                  labeled patterns        Estimated using        Computed from comparison
                                                          proposed method        of AI and GT labels
Segment  Description              Wheat    "Other"   j    i = 1     i = 2        i = 1     i = 2
1231     Jackson County, Okla.    71       25        1    0.9586    0.0414       0.9315    0.0685
                                                     2    0.1330    0.8670       0.1304    0.8696
1520     Bigstone County, Mont.   20       71        1    0.8629    0.1371       0.7917    0.2083
                                                     2    0.0363    0.9637       0.0150    0.9850

From table 3-1, it is seen that the labeling accuracies estimated using the
proposed method are in reasonable agreement with the ones computed from the GT
labels. Also, it is to be noted that, although the GT labels of remote sensing
data are fairly accurate, they are not perfect.
4. MISLABEL CORRECTION WITH SPECIFIED PROBABILITY OF BAD LABELING


In this section, the problem of identification and correction of mislabels of 
patterns using unlabeled patterns is considered. In particular, thresholding 
schemes are proposed for the identification and correction of mislabels. A 
relationship is developed between the probability that such a scheme gives a 
bad label to a pattern and the probability that the scheme accepts the 
original label of the pattern. This relationship could be used in computing
the threshold with a specified probability of bad labeling. It is assumed
that the probabilities of label imperfections are symmetric. That is,

$$\beta = \beta_{ii}; \quad i = 1,2,\ldots,M$$

and

$$b = \beta_{ij}; \quad i,j = 1,2,\ldots,M; \; i \neq j \qquad (33)$$

It is also assumed that $\beta$ is estimated using a technique such as the one in
section 3. The following thresholding scheme is proposed when $\beta > b$. The
case when $\beta < b$ is treated in appendix B.


4.1 A THRESHOLDING SCHEME FOR MISLABEL CORRECTION WHEN $\beta > b$

For the identification and correction of mislabels of the patterns when $\beta > b$,
the following scheme is proposed.

Change the label of $X$ to $\omega = i$, where $i$ is the class attaining the maximum,
whenever

$$\max_i [p(\omega' = i \mid X)] > 1 - t \qquad (34)$$

where $t$ is some threshold; otherwise, do not change the label of $X$. That is,
the label of $X$ is changed whenever there is enough confidence in the
thresholding scheme to change the label. (Section 4.2 discusses the
computation of $t$.) Define a random variable $U(X)$,

$$U(X) = \max_i [p(\omega' = i \mid X)] \qquad (35)$$

Let $V_{DCL}(t)$ be the region of the feature space over which, for a particular
threshold $t$, the original label of pattern $X$ is accepted. That is,

$$V_{DCL}(t) = \{X \mid U(X) \le (1 - t)\} \qquad (36)$$

Let $P_{DCL}(t)$ be the probability that the thresholding scheme will not change
the label of a pattern at threshold $t$, that is, the probability that a pattern
lies in the region $V_{DCL}(t)$. Using the above label correction scheme, whenever
the label of a particular pattern $X$ is changed, let $P_{BL}(t)$ be the probability
that a bad label will be given to a pattern. The threshold $t$ can be determined
by specifying the $P_{BL}$. A relationship between $P_{BL}(t)$ and $P_{DCL}(t)$, which can
be used to compute the threshold $t$ using unlabeled patterns, is developed in
the next section.
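The thresholding rule of equation (34) can be sketched in a few lines. The function names and the use of 1-based class numbers are assumptions made for this illustration; the a posteriori probabilities of the imperfect labels are assumed to have been estimated beforehand.

```python
def correct_label(noisy_posterior, original_label, t):
    """Apply the rule of equation (34) to one pattern.

    noisy_posterior[i] holds p(w'=i+1|X); labels are 1-based class numbers.
    """
    u = max(noisy_posterior)                   # U(X) of equation (35)
    if u > 1.0 - t:
        return noisy_posterior.index(u) + 1    # relabel with the maximizing class
    return original_label                      # pattern lies in the accept region


def fraction_unchanged(noisy_posteriors, t):
    """Empirical share of patterns whose original label is kept at threshold t."""
    kept = sum(1 for q in noisy_posteriors if max(q) <= 1.0 - t)
    return kept / len(noisy_posteriors)


if __name__ == "__main__":
    print(correct_label([0.85, 0.15], 2, 0.2))   # confident enough: relabeled
    print(correct_label([0.60, 0.40], 2, 0.2))   # not confident: label kept
    print(fraction_unchanged([[0.85, 0.15], [0.6, 0.4], [0.3, 0.7]], 0.2))
```

The second function gives an estimate of the probability of accepting the original label that requires only unlabeled patterns.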

4.2 A RELATIONSHIP BETWEEN $P_{BL}(t)$ AND $P_{DCL}(t)$

In this section, a relationship is developed between $P_{BL}(t)$ and $P_{DCL}(t)$ for
symmetric probabilities of label imperfections when $\beta > b$. Suppose that the
threshold is decreased from $t$ to $t - \Delta t$. Then let the region $V_{DCL}(t)$ be
expanded from $V_{DCL}(t)$ to $V_{DCL}(t - \Delta t)$. At threshold $t$, the labels of the
patterns in the incremental region $\Delta V_{DCL}(t)$ are changed; but, at threshold
$t - \Delta t$, they are not changed. The patterns in the region $\Delta V_{DCL}(t)$ satisfy the
relation

$$(1 - t)\, p(X) < \max_i [P(\omega' = i)\, p(X \mid \omega' = i)] \le (1 - t + \Delta t)\, p(X) \qquad (37)$$

Let $\Delta P_{DCL}(t)$ be the increment in the probability $P_{DCL}(t)$. It is also the
probability that a pattern lies in the region $\Delta V_{DCL}(t)$. That is,

$$\Delta P_{DCL}(t) = \int_{\Delta V_{DCL}(t)} p(X)\, dX \qquad (38)$$

Let $\Delta P_{CL}(t)$ and $\Delta P_{BL}(t)$ be the increments of the probability of correct
labeling and of the probability of bad labeling, respectively, when the
threshold is decreased from $t$ to $t - \Delta t$. Because of the increase of region
$V_{DCL}(t)$, there will be a decrease in the probabilities $P_{CL}(t)$ and $P_{BL}(t)$.
When $\beta > b$, $\Delta P_{CL}(t)$ and $\Delta P_{BL}(t)$ can be written as

$$-\Delta P_{CL}(t) = \int_{\Delta V_{DCL}(t)} \max_i [P(\omega = i)\, p(X \mid \omega = i)]\, dX \qquad (39)$$

and

$$-\Delta P_{BL}(t) = \int_{\Delta V_{DCL}(t)} \left\{1 - \max_i [p(\omega = i \mid X)]\right\} p(X)\, dX \qquad (40)$$

In the region $\Delta V_{DCL}(t)$, the probability $\Delta P_{DCL}(t)$ can be split into two parts:
(1) the decrease in the probability of correct labeling of a pattern $\Delta P_{CL}(t)$
and (2) the decrease in the probability of bad labeling of a pattern
$\Delta P_{BL}(t)$. That is, from equations (38), (39), and (40), we obtain

$$\Delta P_{DCL}(t) = -\Delta P_{CL}(t) - \Delta P_{BL}(t) \qquad (41)$$

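Consider

$$
\begin{aligned}
p(\omega' = i \mid X) &= \sum_{j=1}^{M} \beta_{ji}\, p(\omega = j \mid X) \\
&= \beta\, p(\omega = i \mid X) + b \sum_{j=1,\, j \neq i}^{M} p(\omega = j \mid X) \\
&= (\beta - b)\, p(\omega = i \mid X) + b
\end{aligned}
\qquad (42)
$$

From equation (42), we obtain

$$\max_i [p(\omega = i \mid X)] = \frac{\max_i [p(\omega' = i \mid X)] - b}{(\beta - b)} \qquad (43)$$

Using equations (38) and (43) in equation (40) yields

$$\int_{\Delta V_{DCL}(t)} \max_i [p(\omega' = i \mid X)]\, p(X)\, dX = (\beta - b)\, \Delta P_{BL}(t) + \beta\, \Delta P_{DCL}(t) \qquad (44)$$

Therefore, from equations (37) and (44), in the incremental region $\Delta V_{DCL}(t)$,
we have

$$\frac{(1 - t - \beta)}{(\beta - b)}\, \Delta P_{DCL}(t) < \Delta P_{BL}(t) \le \frac{(1 - t - \beta)}{(\beta - b)}\, \Delta P_{DCL}(t) + \frac{\Delta t\, \Delta P_{DCL}(t)}{(\beta - b)} \qquad (45)$$

Summing equation (45), with $t$ steadily decreasing from $t$ to 0, yields the
following.

$$\sum \frac{(1 - t - \beta)}{(\beta - b)}\, \Delta P_{DCL}(t) < \sum \Delta P_{BL}(t) \le \sum \frac{(1 - t - \beta)}{(\beta - b)}\, \Delta P_{DCL}(t) + \sum \frac{\Delta t\, \Delta P_{DCL}(t)}{(\beta - b)} \qquad (46)$$

If we let $\Delta t$ tend to zero, the last sum of the above equation vanishes,
resulting in

$$P_{BL}(t) = \int_0^t \frac{(1 - \tau - \beta)}{(\beta - b)}\, dP_{DCL}(\tau) \qquad (47)$$

Equation (47) shows a relationship between $P_{DCL}(t)$ and $P_{BL}(t)$. Once the
densities are estimated from the imperfectly labeled patterns, $P_{DCL}(t)$ can be
computed from the unlabeled samples. For a specified $P_{BL}$, equation (47) can
be used to compute the threshold $t$.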
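One way to use this relationship with unlabeled patterns is sketched below. It assumes $\beta > b$ and relies on the fact, which follows from equations (40) and (43), that each relabeled pattern contributes $(\beta - U)/(\beta - b)$ to the probability of bad labeling, where $U$ is the largest a posteriori probability of the imperfect labels. The function names, the grid search, and its step count are choices made for this illustration only.

```python
def bad_label_probability(u_values, beta, b, t):
    """Estimate the probability of bad labeling at threshold t.

    u_values holds U(X_j) = max_i p(w'=i|X_j) for the unlabeled patterns.
    A relabeled pattern (U > 1 - t) contributes 1 - max_i p(w=i|X), which equals
    (beta - U)/(beta - b) for symmetric imperfections with beta > b.
    """
    total = sum((beta - u) / (beta - b) for u in u_values if u > 1.0 - t)
    return total / len(u_values)


def threshold_for(u_values, beta, b, target, steps=1000):
    """Scan t upward on a grid and stop at the first value whose estimate exceeds target."""
    best = 0.0
    for k in range(steps + 1):
        t = k / steps
        if bad_label_probability(u_values, beta, b, t) <= target:
            best = t
        else:
            break
    return best


if __name__ == "__main__":
    u = [0.9, 0.7251, 0.6]
    print(bad_label_probability(u, 0.9, 0.1, 0.35))
    print(threshold_for(u, 0.9, 0.1, 0.05))
```

With estimated densities, some values of $U$ may slightly exceed $\beta$ and contribute negative terms; the sketch does not correct for this.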


4.3 AN EXAMPLE 


The mislabel correction scheme presented in section 4.1 does not change the
label of pattern $X$ whenever $\max_i p(\omega' = i \mid X) \le (1 - t)$. For a two-class case,
the region $V_{DCL}(t)$ also can be described as those X-values for which the
following relation is satisfied:

$$\frac{t}{1 - t} \le \frac{P(\omega' = 1)\, p(X \mid \omega' = 1)}{P(\omega' = 2)\, p(X \mid \omega' = 2)} \le \frac{1 - t}{t} \qquad (48)$$

Using equation (4) for a symmetric mislabeling case, equation (48) can be
written as

$$\frac{t - (1 - \beta)}{\beta - t} \le \frac{P(\omega = 1)\, p(X \mid \omega = 1)}{P(\omega = 2)\, p(X \mid \omega = 2)} \le \frac{\beta - t}{t - (1 - \beta)} \qquad (49)$$


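For this example, it is assumed that the a priori probabilities are equal and
the class conditional densities are Gaussian with equal covariance matrices.
That is,

$$P(\omega = i) = 0.5, \quad p(X \mid \omega = i) \sim N(M_i, \Sigma); \quad i = 1,2 \qquad (50)$$

Let

$$v(X) = \log\left[\frac{P(\omega = 1)\, p(X \mid \omega = 1)}{P(\omega = 2)\, p(X \mid \omega = 2)}\right] = X^T \Sigma^{-1}(M_1 - M_2) - \frac{1}{2}\left[M_1^T \Sigma^{-1} M_1 - M_2^T \Sigma^{-1} M_2\right] \qquad (51)$$

Let

$$s^2 = (M_1 - M_2)^T \Sigma^{-1} (M_1 - M_2) \qquad (52)$$

where $s$ is the Mahalanobis distance between the pattern classes.

Since $v(X)$ is a linear combination of Gaussian random variables, it is also
normally distributed. The class conditional densities of $v(X)$ can be written
as

$$p[v(X) \mid \omega = 1] \sim N\!\left(\tfrac{1}{2}s^2,\, s^2\right), \quad p[v(X) \mid \omega = 2] \sim N\!\left(-\tfrac{1}{2}s^2,\, s^2\right) \qquad (53)$$

Then the probabilities are computed as follows. Let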


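$$t_1 = \log\left[\frac{t - (1 - \beta)}{\beta - t}\right], \quad t_2 = \log\left[\frac{\beta - t}{t - (1 - \beta)}\right], \quad \phi(\xi) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}\xi^2\right)$$

$$a = -\frac{s}{2} - \frac{1}{s}\log\left[\frac{\beta - t}{t - (1 - \beta)}\right]$$

and

$$b = -\frac{s}{2} + \frac{1}{s}\log\left[\frac{\beta - t}{t - (1 - \beta)}\right] \qquad (54)$$

These quantities are defined for $1 - \beta < t < 1/2$. Consider the following.

$$
\begin{aligned}
P_{DCL}(t) &= P(\omega = 1)\int_{t_1}^{t_2} p[v(X) \mid \omega = 1]\, d[v(X)] + P(\omega = 2)\int_{t_1}^{t_2} p[v(X) \mid \omega = 2]\, d[v(X)] \\
&= 0.5\left[\int_{(t_1 - \frac{1}{2}s^2)/s}^{(t_2 - \frac{1}{2}s^2)/s} \phi(\xi)\, d\xi + \int_{(t_1 + \frac{1}{2}s^2)/s}^{(t_2 + \frac{1}{2}s^2)/s} \phi(\xi)\, d\xi\right] \\
&= 0.5\left[\int_{a}^{b} \phi(\xi)\, d\xi + \int_{-b}^{-a} \phi(\xi)\, d\xi\right] \\
&= \Phi(b) - \Phi(a)
\end{aligned}
\qquad (55)
$$

where $\Phi(a) = \int_{-\infty}^{a} \phi(\xi)\, d\xi$.

Similarly, for a two-class case, the probability that the algorithm gives a
bad label can be written as

$$
\begin{aligned}
P_{BL}(t) &= P(\omega = 1)\Pr\left\{\frac{P(\omega = 1)\, p(X \mid \omega = 1)}{P(\omega = 2)\, p(X \mid \omega = 2)} < \frac{t - (1 - \beta)}{\beta - t} \,\Big|\, \omega = 1\right\} \\
&\quad + P(\omega = 2)\Pr\left\{\frac{P(\omega = 1)\, p(X \mid \omega = 1)}{P(\omega = 2)\, p(X \mid \omega = 2)} > \frac{\beta - t}{t - (1 - \beta)} \,\Big|\, \omega = 2\right\} \\
&= 0.5\left[\int_{-\infty}^{(t_1 - \frac{1}{2}s^2)/s} \phi(\xi)\, d\xi + \int_{(t_2 + \frac{1}{2}s^2)/s}^{\infty} \phi(\xi)\, d\xi\right] \\
&= 0.5\left[\int_{-\infty}^{a} \phi(\xi)\, d\xi + \int_{-a}^{\infty} \phi(\xi)\, d\xi\right] \\
&= \Phi(a)
\end{aligned}
\qquad (56)
$$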





Figures 4-1 through 4-4 show the plots of $P_{BL}$ versus $t$, $P_{DCL}$ versus $t$, $P_{CL}$
versus $t$, and $P_{BL}$ versus $P_{DCL}$, respectively, for values of $\beta$ = 0.95, 0.91,
0.85, and 0.81 and for values of $s$ = 1, 2, 3, and 4. The tip of the arrow
points to the direction of increase in the value of $\beta$. From figure 4-1, it is
seen that for a specified $P_{BL}$ the threshold $t$ increases with the increase in
the probability of imperfections in the labels or with the decrease in the
value of $\beta$.
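The curves of this example can be reproduced numerically from the closed forms of equations (55) and (56). The sketch below is an illustration under stated assumptions: the integration limits are taken as $-s/2 \mp (1/s)\log[(\beta - t)/(t - (1 - \beta))]$, the function names are hypothetical, and the last function checks the closed form of the bad-label probability against a Riemann-Stieltjes sum of $(1 - \tau - \beta)/(2\beta - 1)$ with respect to the probability of accepting the original label, which is the two-class form of the relationship of section 4.2.

```python
import math


def phi_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def limits(t, beta, s):
    """Integration limits for 1 - beta < t < 1/2 (assumed form, see lead-in)."""
    log_ratio = math.log((beta - t) / (t - (1.0 - beta)))
    return -s / 2.0 - log_ratio / s, -s / 2.0 + log_ratio / s


def p_dcl(t, beta, s):
    """Probability that the original label is kept (equation (55))."""
    if t <= 1.0 - beta:
        return 1.0              # no a posteriori probability of w' can exceed beta
    if t >= 0.5:
        return 0.0              # the largest a posteriori probability is at least 1/2
    lo, hi = limits(t, beta, s)
    return phi_cdf(hi) - phi_cdf(lo)


def p_bl(t, beta, s):
    """Probability that a bad label is given (equation (56))."""
    if t <= 1.0 - beta:
        return 0.0
    if t >= 0.5:
        return phi_cdf(-s / 2.0)   # every pattern receives the Bayes decision
    lo, _ = limits(t, beta, s)
    return phi_cdf(lo)


def p_bl_from_relationship(t, beta, s, steps=20000):
    """Riemann-Stieltjes sum of (1 - tau - beta)/(2 beta - 1) dP_DCL(tau) on [1 - beta, t]."""
    start = 1.0 - beta
    width = (t - start) / steps
    total, prev = 0.0, p_dcl(start, beta, s)
    for k in range(1, steps + 1):
        tau = start + width * k
        cur = p_dcl(tau, beta, s)
        mid = tau - 0.5 * width
        total += (1.0 - mid - beta) / (2.0 * beta - 1.0) * (cur - prev)
        prev = cur
    return total


if __name__ == "__main__":
    for t in (0.15, 0.3, 0.45):
        print(t, p_dcl(t, 0.9, 2.0), p_bl(t, 0.9, 2.0),
              p_bl_from_relationship(t, 0.9, 2.0))
```

The closed form and the numerical sum agree closely, and the bad-label probability grows with $t$ until it reaches the Bayes error $\Phi(-s/2)$ at $t = 1/2$.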









Figure 4-4.- Trade-off between $P_{BL}$ and $P_{DCL}$.


5. CONCLUSIONS 


In the practical applications of pattern recognition, obtaining labels for the 
training patterns is expensive, and very often these labels are imperfect. 
Schemes are presented in this paper for the estimation of probabilities of 
label imperfections and correction of mislabels. 

The risk incurred by the Bayes classifier is the minimum risk that can be 
achieved. The conditional risk r(X) can be obtained as a function of X, using 
estimated densities from the labeled patterns. The probability of error can 
be estimated as an average value of r(X) over the unlabeled patterns. The 
resulting estimated probability of error has less variance when compared to 
the variance of the error estimate based on counting the misclassified labeled 
test set. Using the relationships between the probability densities with and
without imperfections in the labels, the problem of estimating the
probabilities of label imperfections is formulated for the two-class and
multiclass cases as that of minimizing the Bayes probability of error with
probability constraints. Optimization techniques, such as the
Davidon-Fletcher-Powell procedure, can be used to estimate the probabilities
of label imperfections.
Experimental results from processing remotely sensed MSS imagery data are pre- 
sented. The estimated probabilities of label imperfections using the proposed 
method and the probabilities of label imperfections computed using the imper- 
fect (AI) and the GT labels are in good agreement. 

Thresholding schemes are proposed for correcting mislabels of the patterns. 
Whenever there is enough confidence in the scheme (as determined by the 
threshold), the correct label of the pattern is determined. A relationship 
between the probability that such a scheme will give a bad label to a pattern 
and the probability that the scheme will accept the original label of the pat- 
tern is developed for a symmetric mislabeling case. This relationship could 
be used to compute the threshold from the relatively inexpensive unlabeled 
patterns, for a specified probability of bad labeling. 




An example is presented for Gaussian distributions with equal covariance
matrices and equal a priori probabilities. This illustrates the behavior of
the probability that the scheme gives a bad label, the probability that the
scheme gives a correct label, and the probability that the scheme accepts the
original label. All are functions of the threshold, of various probabilities
of label imperfections, and of different Mahalanobis distances between the
classes. The trade-off curves between $P_{BL}$ and $P_{DCL}$ are presented for this
example.

For a two-class case, bounds are presented between the Bayes probabilities of 
error with and without imperfections in the labels. Furthermore, it is shown 
that these bounds become identical when the imperfections in the labels become 
symmetric. 
APPENDIX A


BAYES ERROR PROBABILITIES WITH AND WITHOUT 
IMPERFECTIONS IN THE LABELS 


The Bayes risk in classifying a pattern $X$ can be written as

$$r(X) = 1 - \max_i [p(\omega = i \mid X)] \qquad \text{(A-1)}$$

and

$$r'(X) = 1 - \max_i [p(\omega' = i \mid X)] \qquad \text{(A-2)}$$

where $r(X)$ is the conditional risk with the densities without label
imperfections and $r'(X)$ is the conditional risk with imperfections in the
labels. The probabilities of error can be written as

$$P_e = E_{p(X)}[r(X)] \qquad \text{(A-3)}$$

and

$$P_e' = E_{p(X)}[r'(X)] \qquad \text{(A-4)}$$

where $E$ is the expectation operator. For symmetric probabilities of label
imperfections of equation (33), theorem A-1 gives the relationship between the
error probabilities $P_e$ and $P_e'$.


Theorem A-1: If the probabilities of imperfections in the labels are
symmetric, as given in equation (33), and $\beta > b$, then the Bayes probabilities
of error with and without imperfections in the labels are related as

$$P_e' = (\beta - b)P_e + (1 - \beta) \qquad \text{(A-5)}$$

Proof: From equations (5), (33), (A-1), and (A-2), we obtain

$$
\begin{aligned}
r'(X) &= 1 - \max_i \left[\sum_{j=1}^{M} \beta_{ji}\, p(\omega = j \mid X)\right] \\
&= 1 - \left\{(\beta - b)\max_i [p(\omega = i \mid X)] + b\right\} \\
&= 1 - \left\{(\beta - b)[1 - r(X)] + b\right\} \\
&= (\beta - b)\, r(X) + (1 - \beta)
\end{aligned}
\qquad \text{(A-6)}
$$

Taking expectations on both sides of equation (A-6) yields equation (A-5).

For two-class symmetric probabilities of label imperfections, the error
probabilities are related as

$$P_e' = (2\beta - 1)P_e + (1 - \beta) \qquad \text{(A-7)}$$

If the label imperfection probabilities are not symmetric, the Bayes errors
depend on the particular probability density functions of the patterns.
However, for a two-class case, the following bounds are obtained between $P_e$
and $P_e'$ and are shown to reduce to the identity of equation (A-7) when the
imperfections in the labels become symmetric.
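The pointwise relation of equation (A-6) can be checked numerically for any vector of a posteriori probabilities. The sketch below does so for a three-class case; the function names are hypothetical, and the off-diagonal probability is derived from the requirement that the label probabilities for each true class sum to one.

```python
def symmetric_noisy_posteriors(p, beta):
    """Map p(w=i|X) to p(w'=i|X) = (beta - b) p(w=i|X) + b for symmetric imperfections."""
    m = len(p)
    b = (1.0 - beta) / (m - 1)     # off-diagonal probability from the sum-to-one constraint
    return [(beta - b) * pi + b for pi in p]


def conditional_risk(p):
    """One minus the largest a posteriori probability."""
    return 1.0 - max(p)


if __name__ == "__main__":
    p = [0.6, 0.3, 0.1]
    beta, b = 0.8, 0.1
    q = symmetric_noisy_posteriors(p, beta)
    # Both sides of equation (A-6).
    print(conditional_risk(q), (beta - b) * conditional_risk(p) + (1.0 - beta))
```

Averaging both sides over the patterns gives the relation of theorem A-1 between the two error probabilities.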

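A.1 LOWER BOUND ON $P_e$

Case (a): $\beta_{11} > \beta_{22}$

Consider the case when $\beta_{11} > \beta_{22}$. From equations (7) and (8), we obtain

$$
\begin{aligned}
&P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2) \\
&\quad = \frac{1}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}}\left[(\beta_{22} + \beta_{12})\, P(\omega' = 1)\, p(X \mid \omega' = 1) - (\beta_{11} + \beta_{21})\, P(\omega' = 2)\, p(X \mid \omega' = 2)\right] \\
&\quad = \alpha_1\left[P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right] - \alpha_2\, P(\omega' = 2)\, p(X \mid \omega' = 2)
\end{aligned}
\qquad \text{(A-8)}
$$

where

$$\alpha_1 = \frac{(-\beta_{11} + \beta_{22} + 1)}{(\beta_{11} + \beta_{22} - 1)} \quad \text{and} \quad \alpha_2 = \frac{2(\beta_{11} - \beta_{22})}{(\beta_{11} + \beta_{22} - 1)} \qquad \text{(A-9)}$$

Define the regions $\Omega_1$ and $\Omega_2$ as

$$\Omega_1 = \{X \mid P(\omega' = 1)\, p(X \mid \omega' = 1) > P(\omega' = 2)\, p(X \mid \omega' = 2)\} \qquad \text{(A-10)}$$

and

$$\Omega_2 = \{X \mid P(\omega' = 1)\, p(X \mid \omega' = 1) < P(\omega' = 2)\, p(X \mid \omega' = 2)\} \qquad \text{(A-11)}$$

Let $\Omega_{11}$ and $\Omega_{12}$ be subsets of region $\Omega_1$, where

$$\Omega_{11} = \{X \mid \alpha_1[P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)] > \alpha_2\, P(\omega' = 2)\, p(X \mid \omega' = 2)\} \qquad \text{(A-12)}$$

and

$$\Omega_{12} = \{X \mid \alpha_1[P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)] < \alpha_2\, P(\omega' = 2)\, p(X \mid \omega' = 2)\} \qquad \text{(A-13)}$$

Let

$$\alpha_3 = \frac{\alpha_1}{\alpha_1 + \alpha_2} = \frac{1 + \beta_{22} - \beta_{11}}{1 + \beta_{11} - \beta_{22}} < 1 \qquad \text{(A-14)}$$

In terms of $\alpha_3$, the regions $\Omega_{11}$ and $\Omega_{12}$ become

$$\Omega_{11} = \{X \mid \alpha_3\, P(\omega' = 1)\, p(X \mid \omega' = 1) > P(\omega' = 2)\, p(X \mid \omega' = 2)\} \qquad \text{(A-15)}$$

and

$$\Omega_{12} = \{X \mid \alpha_3\, P(\omega' = 1)\, p(X \mid \omega' = 1) < P(\omega' = 2)\, p(X \mid \omega' = 2)\} \qquad \text{(A-16)}$$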


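From equations (A-8) through (A-16), equation (A-17) is obtained.

$$
\begin{aligned}
&\int \left|P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2)\right| dX \\
&\quad = \alpha_1 \int_{\Omega_2} \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX + \alpha_2 \int_{\Omega_2} P(\omega' = 2)\, p(X \mid \omega' = 2)\, dX \\
&\qquad + \alpha_1 \int_{\Omega_{11}} \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX - \alpha_2 \int_{\Omega_{11}} P(\omega' = 2)\, p(X \mid \omega' = 2)\, dX \\
&\qquad - \alpha_1 \int_{\Omega_{12}} \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX + \alpha_2 \int_{\Omega_{12}} P(\omega' = 2)\, p(X \mid \omega' = 2)\, dX \\
&\quad = \alpha_1 \int \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX + \alpha_2\, P(\omega' = 2) \\
&\qquad - 2\alpha_1 \int_{\Omega_{12}} \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX - 2\alpha_2\, P(\omega' = 2) \int_{\Omega_{11}} p(X \mid \omega' = 2)\, dX
\end{aligned}
\qquad \text{(A-17)}
$$

The regions $\Omega_{11}$ and $\Omega_{12}$ are illustrated in figure A-1. For a two-class case,
from equations (A-1) through (A-4), the following relationships are developed.

$$P_e = \frac{1}{2} - \frac{1}{2}\int \left|P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2)\right| dX \qquad \text{(A-18)}$$

and

$$P_e' = \frac{1}{2} - \frac{1}{2}\int \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX \qquad \text{(A-19)}$$

Using equations (A-18) and (A-19) in equation (A-17) yields the following.

$$
\begin{aligned}
P_e &= \frac{1}{2} - \frac{\alpha_1}{2} + \alpha_1 P_e' - \frac{\alpha_2}{2}\, P(\omega' = 2) + \alpha_2\, P(\omega' = 2) \int_{\Omega_{11}} p(X \mid \omega' = 2)\, dX \\
&\quad + \alpha_1 \int_{\Omega_{12}} \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX
\end{aligned}
\qquad \text{(A-20)}
$$

$$P_e \ge \frac{1}{2} - \frac{1}{2}\,\frac{(-\beta_{11} + \beta_{22} + 1)}{(\beta_{11} + \beta_{22} - 1)} - \frac{(\beta_{11} - \beta_{22})}{(\beta_{11} + \beta_{22} - 1)}\, P(\omega' = 2) + \frac{(-\beta_{11} + \beta_{22} + 1)}{(\beta_{11} + \beta_{22} - 1)}\, P_e' \qquad \text{(A-21)}$$

When the imperfections in the labels become symmetric, it is easily seen that
$\alpha_2 = 0$ and the region $\Omega_{12}$ becomes the null set. The inequality of equation
(A-21) then becomes an equality and is identical to equation (A-7).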


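Case (b): $\beta_{11} < \beta_{22}$

Consider the case when $\beta_{11} < \beta_{22}$. Let

$$\alpha_1 = \frac{(\beta_{11} - \beta_{22} + 1)}{(\beta_{11} + \beta_{22} - 1)} \quad \text{and} \quad \alpha_2 = \frac{2(\beta_{22} - \beta_{11})}{(\beta_{11} + \beta_{22} - 1)} \qquad \text{(A-22)}$$

Proceeding as before, we obtain

$$
\begin{aligned}
P_e &= \frac{1}{2} - \frac{\alpha_1}{2} + \alpha_1 P_e' - \frac{\alpha_2}{2}\, P(\omega' = 1) \\
&\quad + \alpha_1 \int_{\Omega_{22}} \left|P(\omega' = 1)\, p(X \mid \omega' = 1) - P(\omega' = 2)\, p(X \mid \omega' = 2)\right| dX + \alpha_2\, P(\omega' = 1) \int_{\Omega_{21}} p(X \mid \omega' = 1)\, dX \\
&\ge \frac{1}{2} - \frac{1}{2}\,\frac{(\beta_{11} - \beta_{22} + 1)}{(\beta_{11} + \beta_{22} - 1)} - \frac{(\beta_{22} - \beta_{11})}{(\beta_{11} + \beta_{22} - 1)}\, P(\omega' = 1) + \frac{(\beta_{11} - \beta_{22} + 1)}{(\beta_{11} + \beta_{22} - 1)}\, P_e'
\end{aligned}
\qquad \text{(A-23)}
$$

The regions $\Omega_{21}$ and $\Omega_{22}$ are illustrated in figure A-2.

When the imperfections in the labels are symmetric, it is easily seen that
$\alpha_2 = 0$, and the region $\Omega_{22}$ becomes the null set. The inequality in equation
(A-23) then becomes an equality and is identical to equation (A-7).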


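A.2 UPPER BOUND ON $P_e$


Case (a): $\beta_{11} > \beta_{22}$

Let

$$\alpha_1 = (2\beta_{22} - 1) \quad \text{and} \quad \alpha_2 = 2(\beta_{11} - \beta_{22}) \qquad \text{(A-24)}$$

Proceeding in a manner similar to case (a) of section A.1, the probabilities
of error $P_e$ and $P_e'$ are related as follows.

$$
\begin{aligned}
P_e &= \frac{1}{2} - \frac{1}{2\alpha_1} + \frac{1}{\alpha_1}\, P_e' + \frac{\alpha_2}{2\alpha_1}\, P(\omega = 1) - \frac{\alpha_2}{\alpha_1}\, P(\omega = 1) \int_{\Omega_{21}} p(X \mid \omega = 1)\, dX \\
&\quad - \int_{\Omega_{22}} \left|P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2)\right| dX \\
&\le \frac{1}{2} - \frac{1}{2(2\beta_{22} - 1)} + \frac{(\beta_{11} - \beta_{22})}{(2\beta_{22} - 1)}\, P(\omega = 1) + \frac{1}{(2\beta_{22} - 1)}\, P_e'
\end{aligned}
\qquad \text{(A-25)}
$$

The regions $\Omega_{21}$ and $\Omega_{22}$ are illustrated in figure A-3.

When the imperfections in the labels are symmetric, it is easily seen
that $\alpha_2 = 0$, and the region $\Omega_{22}$ becomes null. The inequality of equation
(A-25) becomes an equality and is identical to equation (A-7).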


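Case (b): $\beta_{11} < \beta_{22}$

Let

$$\alpha_1 = (2\beta_{11} - 1) \quad \text{and} \quad \alpha_2 = 2(\beta_{22} - \beta_{11}) \qquad \text{(A-26)}$$

Proceeding in a manner similar to case (a) of section A.1, the probabilities
of error $P_e$ and $P_e'$ are related as

$$
\begin{aligned}
P_e &= \frac{1}{2} - \frac{1}{2\alpha_1} + \frac{1}{\alpha_1}\, P_e' + \frac{\alpha_2}{2\alpha_1}\, P(\omega = 2) - \frac{\alpha_2}{\alpha_1}\, P(\omega = 2) \int_{\Omega_{11}} p(X \mid \omega = 2)\, dX \\
&\quad - \int_{\Omega_{12}} \left|P(\omega = 1)\, p(X \mid \omega = 1) - P(\omega = 2)\, p(X \mid \omega = 2)\right| dX \\
&\le \frac{1}{2} - \frac{1}{2(2\beta_{11} - 1)} + \frac{(\beta_{22} - \beta_{11})}{(2\beta_{11} - 1)}\, P(\omega = 2) + \frac{1}{(2\beta_{11} - 1)}\, P_e'
\end{aligned}
\qquad \text{(A-27)}
$$

The regions $\Omega_{11}$ and $\Omega_{12}$ are illustrated in figure A-4.

When the imperfections in the labels become symmetric, it is easily seen
that $\alpha_2 = 0$, and the region $\Omega_{12}$ becomes null. The inequality of equation
(A-27) becomes an equality and is identical to equation (A-7).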
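The bounds of this appendix can be checked on a small example. The sketch below is an illustration under stated assumptions: the feature space is discrete, so sums replace the integrals; both diagonal probabilities are assumed to exceed 1/2, so that every denominator is positive; and the function names are hypothetical. The lower bound follows equations (A-21) and (A-23), with the a priori probability of the imperfect class that multiplies $\alpha_2$ in the corresponding decomposition, and the upper bound follows equations (A-25) and (A-27).

```python
def bayes_error(joint1, joint2):
    """Bayes error on a discrete feature space: sum of the smaller joint probability."""
    return sum(min(u, v) for u, v in zip(joint1, joint2))


def noisy_joints(joint1, joint2, b11, b22):
    """Joint probabilities of the imperfect labels from those of the true labels."""
    b12, b21 = 1.0 - b11, 1.0 - b22
    q1 = [b11 * u + b21 * v for u, v in zip(joint1, joint2)]
    q2 = [b12 * u + b22 * v for u, v in zip(joint1, joint2)]
    return q1, q2


def error_bounds(pe_noisy, b11, b22, prior1_noisy, prior1_clean):
    """Lower and upper bounds on the Bayes error without label imperfections.

    Assumes b11 > 1/2 and b22 > 1/2.
    """
    d = b11 + b22 - 1.0
    if b11 >= b22:
        a1 = (1.0 + b22 - b11) / d
        lower = 0.5 - 0.5 * a1 - (b11 - b22) / d * (1.0 - prior1_noisy) + a1 * pe_noisy
        c = 2.0 * b22 - 1.0
        upper = 0.5 - 0.5 / c + (b11 - b22) / c * prior1_clean + pe_noisy / c
    else:
        a1 = (1.0 + b11 - b22) / d
        lower = 0.5 - 0.5 * a1 - (b22 - b11) / d * prior1_noisy + a1 * pe_noisy
        c = 2.0 * b11 - 1.0
        upper = 0.5 - 0.5 / c + (b22 - b11) / c * (1.0 - prior1_clean) + pe_noisy / c
    return lower, upper


if __name__ == "__main__":
    j1, j2 = [0.30, 0.20, 0.10], [0.05, 0.10, 0.25]
    q1, q2 = noisy_joints(j1, j2, 0.9, 0.8)
    pe, pe_noisy = bayes_error(j1, j2), bayes_error(q1, q2)
    print(pe, error_bounds(pe_noisy, 0.9, 0.8, sum(q1), sum(j1)))
```

In this example the Bayes error without imperfections is 0.25, and it lies between the computed lower and upper bounds; with equal diagonal probabilities the two bounds coincide.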
APPENDIX B

A THRESHOLDING SCHEME FOR THE CORRECTION
OF MISLABELS WHEN $\beta < b$

When $\beta < b$, the following scheme is proposed for identifying mislabeled
patterns with symmetric probabilities of label imperfections, as given in
equation (33).

Change the label of $X$ to $\omega = i$ whenever

$$\min_i [p(\omega' = i \mid X)] < 1 - t \qquad \text{(B-1)}$$

where $t$ is some threshold; otherwise, do not change the label of $X$. For this
scheme, a relationship between $P_{BL}(t)$ and $P_{DCL}(t)$ is obtained in the following
and is shown to be equivalent to equation (47).

Let

$$U(X) = \min_i [p(\omega' = i \mid X)] \qquad \text{(B-2)}$$

and

$$V_{DCL}(t) = \{X \mid U(X) > 1 - t\} \qquad \text{(B-3)}$$

Suppose that the threshold $t$ is decreased from $t$ to $t - \Delta t$. Let $\Delta V_{DCL}(t)$ be
the decremental region $V_{DCL}(t) - V_{DCL}(t - \Delta t)$. For patterns in the region
$\Delta V_{DCL}(t)$, we have

$$(1 - t)\, p(X) < \min_i [P(\omega' = i)\, p(X \mid \omega' = i)] \le (1 - t + \Delta t)\, p(X) \qquad \text{(B-4)}$$

Let $\Delta P_{DCL}(t)$, $\Delta P_{CL}(t)$, and $\Delta P_{BL}(t)$ be the increments in the probabilities
$P_{DCL}(t)$, $P_{CL}(t)$, and $P_{BL}(t)$, respectively, because of the decrease in the
threshold from $t$ to $t - \Delta t$. Then, we have


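$$-\Delta P_{DCL}(t) = \int_{\Delta V_{DCL}(t)} p(X)\, dX \qquad \text{(B-5)}$$

$$\Delta P_{CL}(t) = \int_{\Delta V_{DCL}(t)} \max_i [P(\omega = i)\, p(X \mid \omega = i)]\, dX \qquad \text{(B-6)}$$

and

$$\Delta P_{BL}(t) = \int_{\Delta V_{DCL}(t)} \left\{1 - \max_i [p(\omega = i \mid X)]\right\} p(X)\, dX \qquad \text{(B-7)}$$

From equations (B-5) through (B-7), we get

$$-\Delta P_{DCL}(t) = \Delta P_{CL}(t) + \Delta P_{BL}(t) \qquad \text{(B-8)}$$

When $\beta < b$, from equation (42), we obtain

$$\max_i [p(\omega = i \mid X)] = \frac{b}{(b - \beta)} - \frac{\min_i [p(\omega' = i \mid X)]}{(b - \beta)} \qquad \text{(B-9)}$$

Using equations (B-5) and (B-9) in equation (B-7) yields

$$(b - \beta)\, \Delta P_{BL}(t) = \beta\, \Delta P_{DCL}(t) + \int_{\Delta V_{DCL}(t)} \min_i [P(\omega' = i)\, p(X \mid \omega' = i)]\, dX \qquad \text{(B-10)}$$

From equations (B-4) and (B-10), in the decremental region $\Delta V_{DCL}(t)$, we have

$$\frac{(t - 1 + \beta)}{(b - \beta)}\, \Delta P_{DCL}(t) < \Delta P_{BL}(t) \le \frac{(t - 1 + \beta)}{(b - \beta)}\, \Delta P_{DCL}(t) - \frac{\Delta t\, \Delta P_{DCL}(t)}{(b - \beta)} \qquad \text{(B-11)}$$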

Summing equation (B-11), with t steadily decreasing from t to zero, and 
letting At tend to zero results in 

It is seen that equation (B-12) is identical to equation (47). 